Non-Linear Algorithms for Parametric Markov Programming
Authors
Abstract
The general idea of nonlinear correcting factors, which has been applied successfully to derive and accelerate algorithms for many linear and nonlinear problems, is used to develop several efficient procedures, based on the Jacobi algorithm, for solving linear systems of equations. The results of a theoretical study and extensive computer experiments on stochastic matrices are presented and analyzed to determine the conditions under which the studied algorithms converge, and the regions of their maximum convergence rate are identified. Applications of the algorithms to Markov decision processes are discussed.

Many problems in computer science and operations research can be formulated as Markov Decision Processes (MDP) [1]. Examples include routing, optimal stopping, target, replacement, maintenance and repair, and inventory problems, as well as optimal control of queues and stochastic scheduling. The utility function of total expected discounted rewards is commonly used in MDPs with finite state and action spaces [1]. In this case, the optimality equation takes the following form:

$$V = \max_{\alpha}\left[ R(\alpha) + \omega S(\alpha) V \right], \tag{1}$$

where $V$ is a value (state) vector, $R(\alpha)$ is a reward vector, $\omega$ is a scalar discount factor, $S(\alpha)$ is a transition probability matrix, and $\alpha$ is a policy (control vector). The objective is to find the optimal policy $\alpha$ and the corresponding value vector $V$, which represents the maximum expected discounted sum of future rewards.

The main approaches to solving problem (1) are policy iteration, value iteration, and linear programming [1]. Both policy iteration and value iteration depend strongly on the efficiency of the algorithm used to solve a linear system of equations (LSE). The policy iteration algorithm solves the equation

$$V = R(\alpha^{(i)}) + \omega S(\alpha^{(i)}) V \tag{2}$$

for every iteration $i$ of the policy $\alpha$. The value iteration algorithm calculates estimates of the next value vector iterate using the equation

$$V^{(i+1)} = \max_{\alpha}\left[ R(\alpha) + \omega S(\alpha) V^{(i)} \right]. \tag{3}$$

The main approaches to accelerating the convergence of algorithm (3) are similar to those for a traditional LSE: Gauss-Seidel, Successive Overrelaxation (SOR), etc. [1]. As a result, faster LSE algorithms can help accelerate existing optimization algorithms and broaden the application area of MDPs.

Mathematically, a linear system of equations can be formulated as follows:

$$x = Ax + b, \tag{4}$$

where $A$ is a known coefficient matrix, $b$ is a known vector, and $x$ is an unknown solution vector. Two major approaches to solving an LSE are commonly recognized: direct and iterative [2]. Direct methods [3] often involve factorization, such as Gaussian elimination, and forward and backward substitution on the vector $b$.
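To make the connection between equations (2), (3), and (4) concrete, the following is a minimal sketch in plain Python. It is not the nonlinear procedure studied in the paper; it only shows the basic, unaccelerated fixed-point iteration x <- Ax + b for system (4) and a textbook value iteration for (3). The function names, the data layout (`R[a][s]`, `S[a][s][t]`), the tolerance, and the small two-state example are all assumptions made for illustration. The iteration is assumed to converge because A = ωS with S stochastic and 0 ≤ ω < 1.

```python
def mat_vec(A, x):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def fixed_point_solve(A, b, tol=1e-10, max_iter=10_000):
    """Basic iteration x <- A x + b for system (4); returns (x, iterations used)."""
    x = [0.0] * len(b)
    for k in range(1, max_iter + 1):
        x_new = [ax + bi for ax, bi in zip(mat_vec(A, x), b)]
        if max(abs(u - v) for u, v in zip(x_new, x)) < tol:
            return x_new, k
        x = x_new
    return x, max_iter


def value_iteration(R, S, omega, tol=1e-10, max_iter=10_000):
    """Iteration (3). R[a][s] is the reward for action a in state s,
    S[a][s][t] the probability of moving from s to t under action a.
    Returns (V, policy), where policy[s] is the maximizing action in state s."""
    n_actions, n_states = len(R), len(R[0])
    V = [0.0] * n_states
    policy = [0] * n_states
    for _ in range(max_iter):
        # Q[a][s] = R[a][s] + omega * sum_t S[a][s][t] * V[t]
        Q = [[r + omega * sv for r, sv in zip(R[a], mat_vec(S[a], V))]
             for a in range(n_actions)]
        V_new = [max(Q[a][s] for a in range(n_actions)) for s in range(n_states)]
        policy = [max(range(n_actions), key=lambda a: Q[a][s])
                  for s in range(n_states)]
        if max(abs(u - v) for u, v in zip(V_new, V)) < tol:
            return V_new, policy
        V = V_new
    return V, policy


# Hypothetical two-state example: action 0 stays put, action 1 switches state.
omega = 0.5
R = [[1.0, 0.0],   # rewards for action 0 in states 0 and 1
     [0.0, 0.5]]   # rewards for action 1 in states 0 and 1
S = [[[1.0, 0.0], [0.0, 1.0]],   # action 0: identity transitions
     [[0.0, 1.0], [1.0, 0.0]]]   # action 1: swap states

V_opt, policy = value_iteration(R, S, omega)

# Policy evaluation (2) for the policy found above, written in the form (4):
# row s of A is omega * S[policy[s]][s], and b[s] is R[policy[s]][s].
A = [[omega * p for p in S[policy[s]][s]] for s in range(2)]
b = [R[policy[s]][s] for s in range(2)]
V_eval, n_iter = fixed_point_solve(A, b)
```

In this sketch, evaluating the policy returned by value iteration with the linear iteration reproduces the same value vector, which illustrates why a faster solver for (4) carries over directly to both MDP methods.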
Similar resources
On the optimization of Dombi non-linear programming
Dombi family of t-norms includes a parametric family of continuous strict t-norms, whose members are increasing functions of the parameter. This family of t-norms covers the whole spectrum of t-norms when the parameter is changed from zero to infinity. In this paper, we study a nonlinear optimization problem in which the constraints are defined as fuzzy relational equations (FRE) with the Dombi...
Presentation and Solving Non-Linear Quad-Level Programming Problem Utilizing a Heuristic Approach Based on Taylor Theorem
The multi-level programming problems are attractive for many researchers because of their application in several areas such as economic, traffic, finance, management, transportation, information technology, engineering and so on. It has been proven that even the general bi-level programming problem is an NP-hard problem, so the multi-level problems are practical and complicated problems therefo...
A new non-parametric approach for suppliers selection
In this paper we propose a simple non-parametric model for the multiple criteria supplier selection problem. The proposed model does not generate a zero weight for a certain criterion and ranks the suppliers without solving the model n times (one linear programming (LP) problem for each supplier), and therefore allows the manager to get faster results. The methodology is illustrated using an example.
A Parametric Approach for Solving Multi-Objective Linear Fractional Programming Phase
In this paper a multi-objective linear fractional programming problem with fuzzy variables and a vector of fuzzy resources is studied, and an algorithm based on a parametric approach is proposed. The proposed solving procedure is based on the parametric approach to find the solution, which provides the decision maker with more complete information in line with reality. The simplicity of the ...
An Application of the ABS LX Algorithm to Multiple Sequence Alignment
We present an application of ABS algorithms for multiple sequence alignment (MSA). The Markov decision process (MDP) based model leads to a linear programming problem (LPP), whose solution is linked to a suggested alignment. The important features of our work include the facility of alignment of multiple sequences simultaneously and no limit for the length of the sequences. Our goal here is to ...
Linear programming on SS-fuzzy inequality constrained problems
In this paper, a linear optimization problem is investigated whose constraints are defined with fuzzy relational inequality. These constraints are formed as the intersection of two inequality fuzzy systems and Schweizer-Sklar family of t-norms. Schweizer-Sklar family of t-norms is a parametric family of continuous t-norms, which covers the whole spectrum of t-norms when the parameter is changed...
Journal title:
Volume, issue:
Pages: -
Publication date: 2007